
Revamped tuning #130

Merged: 86 commits from revamped-tuning into main on Jan 23, 2025
Conversation

@GardevoirX GardevoirX (Contributor) commented Dec 17, 2024

This PR introduces two things:

More work still needs to be done before this PR is ready, such as writing documentation and fixing the pytests and the example.

Contributor (creator of pull-request) checklist

  • Tests updated (for new features and bugfixes)?
  • Documentation updated (for new features)?
  • Issue referenced (for PRs that solve an issue)?

Reviewer checklist

  • CHANGELOG updated with public API or any other important changes?

📚 Documentation preview 📚: https://torch-pme--130.org.readthedocs.build/en/130/

@GardevoirX GardevoirX linked an issue Dec 17, 2024 that may be closed by this pull request
Comment on lines 22 to 24
def grid_search(
method: str,
charges: torch.Tensor,
Contributor:

I would turn the logic around and keep the tune_XXX methods. Also, grid_search is a very common name; it is not really clear from it that this will find the optimal parameters for the methods.

@@ -515,3 +518,82 @@ def forward(self, positions, cell, charges):
print(f"Evaluation time:\nPytorch: {time_python}ms\nJitted: {time_jit}ms")

# %%
# Other auto-differentiation ideas
Contributor:

IMHO I wouldn't put this example here, even though I think it is good to have. The tutorial is already 500 lines and with this becomes super long. I'd rather vote for smaller examples, each tackling one specific task. Finding solutions is much easier if examples are shorter. See also the beloved matplotlib examples.

@GardevoirX GardevoirX marked this pull request as ready for review January 7, 2025 12:57
@PicoCentauri PicoCentauri (Contributor) left a comment:

Why are there three tuning base classes? TuningErrorBounds and TuningTimings are hardcoded inside GridSearchBase. I think a single base class is enough, no?

src/torchpme/utils/tuning/__init__.py — resolved (outdated)
CalculatorClass = P3MCalculator
GridSearchParams = {
"interpolation_nodes": [2, 3, 4, 5],
"mesh_spacing": 1 / ((np.exp2(np.arange(2, 8)) - 1) / 2),
Contributor:

Shouldn't we give users the option to choose the grid points on which they want to optimize?

@GardevoirX GardevoirX (author):

This is mainly because the possible grid points were hard-coded before. If we can do this, it would be good. Should we let the user input a list of their desired mesh_spacing values at the beginning?
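A rough sketch of what that could look like; the make_grid_search_params helper and the mesh_spacing_grid parameter are hypothetical, not part of this PR:

import numpy as np

# Default grid, mirroring the values currently hard-coded for P3M
DEFAULT_MESH_SPACING_GRID = 1 / ((np.exp2(np.arange(2, 8)) - 1) / 2)

def make_grid_search_params(mesh_spacing_grid=None, interpolation_nodes=(2, 3, 4, 5)):
    """Build the search grid, falling back to the current defaults."""
    if mesh_spacing_grid is None:
        mesh_spacing_grid = DEFAULT_MESH_SPACING_GRID
    return {
        "interpolation_nodes": list(interpolation_nodes),
        "mesh_spacing": list(mesh_spacing_grid),
    }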

)
value = result.sum()
if self._run_backward:
value.backward(retain_graph=True)
Contributor:

Why do you need to retain the graph here?
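For context: in PyTorch, calling backward() a second time through the same graph raises a RuntimeError unless the earlier call used retain_graph=True, which matters when a single forward result is differentiated repeatedly, e.g. in a timing loop. A minimal illustration:

import torch

x = torch.ones(3, requires_grad=True)
y = (x * x).sum()
y.backward(retain_graph=True)  # keep the graph alive for a second pass
y.backward()                   # works; would raise a RuntimeError without retain_graph above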

positions.requires_grad_(True)
cell.requires_grad_(True)
charges.requires_grad_(True)
execution_time -= time.time()
Contributor:

This looks like a very weird way of storing the result. Why not use a temp variable?

Suggested change
execution_time -= time.time()
t0 = time.time()

See below.


if self._device == torch.device("cuda"):
torch.cuda.synchronize()
execution_time += time.time()
Contributor:

Suggested change
execution_time += time.time()
execution_time += time.time() - t0

Comment on lines 142 to 149
self._charges = charges
self._cell = cell
self._positions = positions
self._dtype = charges.dtype
self._device = charges.device
self._n_repeat = n_repeat
self._n_warmup = n_warmup
self._run_backward = run_backward
Contributor:

Do you really need all of these private properties? Many of them seem to be used only once and are hardcoded.

Also, I think user-provided variables should be stored publicly: if I pass positions, I should be able to access them via self.positions and not as a private property.
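A sketch of the suggested public-attribute style; the defaults shown here are hypothetical:

class TuningTimings:
    def __init__(self, charges, cell, positions, n_repeat=4, n_warmup=1, run_backward=False):
        # User-provided inputs stay public, as suggested above
        self.charges = charges
        self.cell = cell
        self.positions = positions
        self.n_repeat = n_repeat
        self.n_warmup = n_warmup
        self.run_backward = run_backward
        # dtype and device are read off the inputs instead of being stored privately
        self.dtype = charges.dtype
        self.device = charges.device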

src/torchpme/utils/tuning/__init__.py — resolved (outdated)
examples/10-tuning.py — resolved
@GardevoirX GardevoirX force-pushed the revamped-tuning branch 2 times, most recently from a41f780 to 33c9705 on January 7, 2025 22:18
@PicoCentauri PicoCentauri (Contributor):

Some notes from our meeting (see the sketch after this list):

  • give a List[Dict] and the calculator class to the init of the tuner class, where each dict holds the named or unnamed parameters passed to the init of the calculator class
  • the class returns a List[float], where the floats are the obtained timings
  • the user-facing function, i.e. tune_ewald, creates the grid and returns the best parameters
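A minimal sketch of that interface; the forward_fn callback, the grid arguments, and the selection criterion are illustrative, not the final API:

import time

class Tuner:
    """Time one calculator class over a list of parameter sets."""

    def __init__(self, calculator_class, params: list[dict]):
        self.calculator_class = calculator_class
        self.params = params  # each dict holds kwargs for the calculator's __init__

    def run(self, forward_fn) -> list[float]:
        # forward_fn(calculator) should execute one forward pass on the user's system
        timings = []
        for kwargs in self.params:
            calculator = self.calculator_class(**kwargs)
            t0 = time.monotonic()
            forward_fn(calculator)
            timings.append(time.monotonic() - t0)
        return timings

def tune_ewald(forward_fn, calculator_class, smearing_grid, lr_wavelength_grid):
    # User-facing sketch: build the grid, time each candidate, return the fastest
    params = [
        {"smearing": s, "lr_wavelength": lw}
        for s in smearing_grid
        for lw in lr_wavelength_grid
    ]
    timings = Tuner(calculator_class, params).run(forward_fn)
    return params[min(range(len(params)), key=timings.__getitem__)]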

@PicoCentauri PicoCentauri (Contributor) left a comment:

Update docstrings and tests.

Write an API doc page explaining the base class and how we do the tuning (reuse the text for updating the paper). In the API references I would add a new tuning section. On the tuning page I would explain how we do the tuning, then create one page for each calculator and finally one page for the base classes. On the base-class page, explain how you designed these classes and how they work together.

The subpages for each calculator should first display the tuning function and, below it, the classes for the error bounds. In the introduction text of each, display the equation for the error bounds.

positions: torch.Tensor,
cutoff: float,
calculator,
params: list[dict],
Contributor:

I think you don't need the exponent; you should be able to extract it from the calculator.

@GardevoirX GardevoirX (author):

But the Potential does not necessarily have the attribute exponent, like CoulombPotential 🤔
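One possible compromise, sketched here rather than taken from the PR: read the attribute with a fallback, assuming plain Coulomb (exponent 1) when it is absent:

def _get_exponent(potential, default: int = 1) -> int:
    # CoulombPotential carries no `exponent` attribute, so fall back to 1/r behaviour
    return getattr(potential, "exponent", default)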

src/torchpme/tuning/base.py — resolved (outdated)
Comment on lines 27 to 29
self._dtype = cell.dtype
self._device = cell.device

Contributor:

Should we put the dtype here as an argument or deduce it from the calculator?

What do you say, @E-Rum?

Comment on lines 49 to 50
@staticmethod
def _validate_parameters(
Contributor:

This is now very similar to the one we use in the calculators, right?

Maybe we extract and merge both and make them a standalone private function living in utils.

@GardevoirX GardevoirX (author):

They are still slightly different from each other: the one in the calculators checks smearing while the one in tuning checks exponent, but it should be possible to extract the common part.
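A sketch of how the common part could be factored out; the function names and the exact checks are hypothetical:

import torch

def _validate_system(charges: torch.Tensor, cell: torch.Tensor, positions: torch.Tensor) -> None:
    """Shared checks that both the calculators and the tuners could reuse."""
    if positions.ndim != 2 or positions.shape[1] != 3:
        raise ValueError(f"`positions` must be an (N, 3) tensor, got {tuple(positions.shape)}")
    if cell.shape != (3, 3):
        raise ValueError(f"`cell` must be a (3, 3) tensor, got {tuple(cell.shape)}")
    if charges.device != positions.device or cell.device != positions.device:
        raise ValueError("`charges`, `cell` and `positions` must be on the same device")

def _validate_tuning_parameters(charges, cell, positions, exponent):
    _validate_system(charges, cell, positions)
    # Tuning-specific check (assumed): the error bounds are derived for exponent 1 only
    if exponent != 1:
        raise ValueError(f"Only exponent = 1 is supported, got {exponent}")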

Contributor:

Okay, yeah, it might be useful to have some code sharing if possible.

@GardevoirX GardevoirX (author):

After getting the standalone function, do we only call it in the tuning functions, or do we still keep calling it during the initialization of the tuner?

@PicoCentauri PicoCentauri (Contributor) left a comment:

Yes, I am happy with the design. I left some initial comments, but we can start making the code ready to go in.

docs/src/references/index.rst — resolved (outdated)
docs/src/references/tuning/base_classes.rst — resolved
docs/src/references/tuning/tune_ewald.rst — resolved
src/torchpme/tuning/error_bounds.py — resolved (outdated)
@PicoCentauri PicoCentauri (Contributor) left a comment:

Very good progress. The overall structure looks convincing to me. I have one major question about the importance of the backward pass in the tuning.

If you can convince me that this is important, we can keep it. Otherwise, I suggest timing only the forward pass.

Contributor:

This example lacks explanations. Linking to the error formulas might be useful, plus some more text between the cells explaining the last cells and the plans for the upcoming code.

Comment on lines 43 to 45
assert isinstance(
potential, Potential
), f"Potential must be an instance of Potential, got {type(potential)}"
Contributor:

I would rather raise a ValueError here. Asserts are usually for testing; if code should check something under all circumstances, one should use a "real" error. See for example:

https://stackoverflow.com/questions/40182944/whats-the-difference-between-raise-try-and-assert

@GardevoirX GardevoirX (author):

I wonder if TypeError is better? And what about those assertions below these lines?

assert self.dtype == self.potential.dtype, (
f"Potential and Calculator must have the same dtype, got {self.dtype} and "
f"{self.potential.dtype}"
)
assert self.device == self.potential.device, (
f"Potential and Calculator must have the same device, got {self.device} and "
f"{self.potential.device}"
)
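For illustration, the same checks rewritten with explicit raises, using TypeError for the wrong class and ValueError for mismatched metadata; this is one reading of the discussion, not the merged code:

if not isinstance(potential, Potential):
    raise TypeError(f"Potential must be an instance of Potential, got {type(potential)}")
if self.dtype != self.potential.dtype:
    raise ValueError(
        f"Potential and Calculator must have the same dtype, got {self.dtype} "
        f"and {self.potential.dtype}"
    )
if self.device != self.potential.device:
    raise ValueError(
        f"Potential and Calculator must have the same device, got {self.device} "
        f"and {self.potential.device}"
    )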

r"""
Find the optimal parameters for :class:`torchpme.EwaldCalculator`.

The error formulas are given `online
Contributor:

I think we don't need this here anymore. We give the equations in the documentation.

@GardevoirX GardevoirX (author):

Wouldn't it be nicer to let users know the origin of these equations in the documentation? Otherwise, if I were a newcomer, I might think you guys fabricated them 😂 Or should we move the references to where the equations are?

Comment on lines 101 to 114
Error bounds for :class:`torchpme.calculators.ewald.EwaldCalculator`.

The error formulas are given `online
<https://www2.icp.uni-stuttgart.de/~icp/mediawiki/images/4/4d/Script_Longrange_Interactions.pdf>`_
(currently not available, needs to be updated later). Note the difference in
notation between the parameters in the reference and ours:

.. math::

    \alpha &= \left( \sqrt{2}\,\mathrm{smearing} \right)^{-1}

    K &= \frac{2 \pi}{\mathrm{lr\_wavelength}}

    r_c &= \mathrm{cutoff}
Contributor:

This docstring seems to be very similar to the tuning function's.

Maybe write something like: this class implements the error bounds for the real-space and Fourier-space parts of the Ewald summation ...

src/torchpme/tuning/ewald.py — resolved (outdated)
src/torchpme/tuning/tuner.py — resolved (outdated)
src/torchpme/tuning/tuner.py — resolved
src/torchpme/tuning/tuner.py — resolved (outdated)
positions.requires_grad_(True)
cell.requires_grad_(True)
charges.requires_grad_(True)
execution_time -= time.time()
Contributor:

I think we should use time.monotonic(); it seems better suited for the timings we do here.

See: https://docs.python.org/3/library/time.html#time.monotonic
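Putting the two timing suggestions together (a temp variable plus time.monotonic()), the measurement could be wrapped roughly like this; the helper and its signature are a sketch, not the merged code:

import time
import torch

def time_one_pass(calculator, positions, cell, charges, run_backward=False):
    """Time one forward (and optionally backward) pass of a calculator-like callable."""
    t0 = time.monotonic()
    value = calculator(positions, cell, charges).sum()
    if run_backward:
        value.backward(retain_graph=True)
    if positions.device.type == "cuda":
        torch.cuda.synchronize()  # flush queued kernels before stopping the clock
    return time.monotonic() - t0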

Comment on lines 305 to 308
if self._run_backward:
positions.requires_grad_(True)
cell.requires_grad_(True)
charges.requires_grad_(True)
Contributor:

Are we sure this is necessary? Is the backward pass really that uncorrelated with the forward pass? Naively I would imagine that if the forward pass takes longer, the same holds for the backward pass, and since this does not include forces, I don't really see the point.

@PicoCentauri PicoCentauri mentioned this pull request Jan 22, 2025
@ceriottm (Contributor):

I'm done with a more-than-decent draft of the example. It explains well how the tuning is done, and then shows how to use the autotuner to also optimize the cutoff.

@PicoCentauri (Contributor):

Thanks, I'll take it from here.

@PicoCentauri PicoCentauri (Contributor) left a comment:

Thanks for this large change @GardevoirX and @ceriottm !

Comment on lines 251 to 254
smearing = torch.as_tensor(smearing)
mesh_spacing = torch.as_tensor(mesh_spacing)
cutoff = torch.as_tensor(cutoff)
interpolation_nodes = torch.as_tensor(interpolation_nodes)
Contributor:

Why are you using as_tensor instead of tensor?
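For context: torch.tensor always copies its input, while torch.as_tensor reuses the existing storage when the input is already a tensor with the right dtype and device (and it preserves autograd history). A quick check:

import torch

x = torch.ones(3)
print(torch.tensor(x).data_ptr() == x.data_ptr())     # False: torch.tensor copies
print(torch.as_tensor(x).data_ptr() == x.data_ptr())  # True: storage is reused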

... )

"""
_validate_parameters(charges, cell, positions, exponent)
Contributor:

Okay, makes sense.

@PicoCentauri PicoCentauri merged commit 9e8ece1 into main Jan 23, 2025
13 checks passed
@PicoCentauri PicoCentauri deleted the revamped-tuning branch January 23, 2025 07:51
Successfully merging this pull request may close these issues.

Grid-searching based tuning scheme
3 participants